Processing of multi-channel signals

ABSTRACT

A method of generating a monaural signal (S) comprising a combination of at least two input audio channels (L, R) is disclosed. Corresponding frequency components from respective frequency spectrum representations for each audio channel (L(k), R(k)) are summed (46) to provide a set of summed frequency components (S(k)) for each sequential segment. For each frequency band (i) of each sequential segment, a correction factor (m(i)) is calculated (45) as a function of a sum of the energy of the frequency components of the summed signal in the band, formula (I), and a sum of the energy of the frequency components of the input audio channels in the band, formula (II). Each summed frequency component is corrected (47) as a function of the correction factor (m(i)) for the frequency band of said component.

The present invention relates to the processing of audio signals and, more particularly, the coding of multi-channel audio signals.

Parametric multi-channel audio coders generally transmit only one full-bandwidth audio channel combined with a set of parameters that describe the spatial properties of an input signal. For example, FIG. 1 shows the steps performed in an encoder 10 described in European Patent Application No. 02079817.9 filed Nov. 20, 2002 (Attorney Docket No. PHNL021156).

In an initial step S1, input signals L and R are split into subbands 101, for example by time-windowing followed by a transform operation. Subsequently, in step S2, the level difference (ILD) of corresponding subband signals is determined; in step S3 the time difference (ITD or IPD) of corresponding subband signals is determined; and in step S4 the amount of similarity or dissimilarity of the waveforms which cannot be accounted for by ILDs or ITDs is described. In the subsequent steps S5, S6, and S7, the determined parameters are quantized.

In step S8, a monaural signal S is generated from the incoming audio signals and finally, in step S9, a coded signal 102 is generated from the monaural signal and the determined spatial parameters.

FIG. 2 shows a schematic block diagram of a coding system comprising the encoder 10 and a corresponding decoder 202. The coded signal 102 comprising the sum signal S and spatial parameters P is communicated to a decoder 202. The signal 102 may be communicated via any suitable communications channel 204. Alternatively or additionally, the signal may be stored on a removable storage medium 214, which may be transferred from the encoder to the decoder.

Synthesis (in the decoder 202) is performed by applying the spatial parameters to the sum signal to generate left and right output signals. Hence, the decoder 202 comprises a decoding module 210 which performs the inverse operation of step S9 and extracts the sum signal S and the parameters P from the coded signal 102. The decoder further comprises a synthesis module 211 which recovers the stereo components L and R from the sum (or dominant) signal and the spatial parameters.

One of the challenges is to generate the monaural signal S, step S8, in such a way that, on decoding into the output channels, the perceived sound timbre is exactly the same as for the input channels.

Several methods of generating this sum signal have been suggested previously. In general these compose a mono signal as a linear combination of the input signals. Particular techniques include:

-   1. Simple summation of the input signals. See for example ‘Efficient representation of spatial audio using perceptual parametrization’, by C. Faller and F. Baumgarte, WASPAA'01, Workshop on applications of signal processing on audio and acoustics, New Paltz, New York, 2001.
-   2. Weighted summation of the input signals using principal component analysis (PCA). See for example European Patent Application No. 02076408.0 filed Apr. 10, 2002 (Attorney Docket No. PHNL020284) and European Patent Application No. 02076410.6 filed Apr. 10, 2002 (Attorney Docket No. PHNL020283). In this scheme, the squared weights of the summation sum up to one and the actual values depend on the relative energies in the input signals.
-   3. Weighted summation with weights depending on the time-domain correlation between the input signals. See for example ‘Joint stereo coding of audio signals’, by D. Sinha, European patent application EP 1 107 232 A2. In this method, the weights sum to +1, while the actual values depend on the cross-correlation of the input channels.
-   4. U.S. Pat. No. 5,701,346, Herre et al discloses weighted summation with energy-preservation scaling for downmixing left, right, and center channels of wideband signals. However, this is not performed as a function of frequency.

These methods can be applied to the full-bandwidth signal or can be applied on band-filtered signals which all have their own weights for each frequency band. However, all methods described have one drawback. If the cross-correlation is frequency-dependent, which is very often the case for stereo recordings, coloration (i.e., a change of the perceived timbre) of the sound of the decoder occurs.

This can be explained as follows: For a frequency band that has a cross-correlation of +1, linear summation of two input signals results in a linear addition of the signal amplitudes, and the resultant energy is determined by squaring the summed amplitude. (For two in-phase signals of equal amplitude, this results in a doubling of amplitude with a quadrupling of energy.) If the cross-correlation is 0, linear summation results in less than a doubling of the amplitude and less than a quadrupling of the energy. Furthermore, if the cross-correlation for a certain frequency band amounts to −1, the signal components of that frequency band cancel out and no signal remains. Hence for simple summation, the frequency bands of the sum signal can have an energy (power) between 0 and four times the power of the two input signals, depending on the relative levels and the cross-correlation of the input signals.
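The dependence of the summed energy on cross-correlation can be illustrated numerically. The following is a minimal sketch, not part of the described encoder: the test signals (one cycle of a sine and a cosine of equal energy) and the helper names `energy` and `sum_energy_ratio` are assumptions chosen for illustration only.

```python
import math

def energy(x):
    """Sum of squared sample values."""
    return sum(v * v for v in x)

def sum_energy_ratio(left, right):
    """Energy of (left + right) relative to the energy of one input."""
    s = [l + r for l, r in zip(left, right)]
    return energy(s) / energy(left)

n = 8
# One full cycle of a sine and of a cosine: equal energy, zero cross-correlation.
a = [math.sin(2 * math.pi * t / n) for t in range(n)]
b = [math.cos(2 * math.pi * t / n) for t in range(n)]

in_phase = sum_energy_ratio(a, a)                    # cross-correlation +1
uncorrelated = sum_energy_ratio(a, b)                # cross-correlation 0
out_of_phase = sum_energy_ratio(a, [-v for v in a])  # cross-correlation -1

print(in_phase, uncorrelated, out_of_phase)
```

For these equal-energy inputs the ratio is about 4 for a cross-correlation of +1, about 2 for a cross-correlation of 0, and 0 for a cross-correlation of −1, which is the spread that a frequency-dependent correlation turns into coloration.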

The present invention attempts to mitigate this problem and provides a method according to claim 1.

If different frequency bands tended to on average have the same correlation, then one might expect that over time distortion caused by such summation would average out over the frequency spectrum. However, it has been recognised that, in multi-channel signals, low frequency components tend to be more correlated than high frequency components. Therefore, it will be seen that without the present invention, summation, which does not take into account frequency dependent correlation of channels, would tend to unduly boost the energy levels of more highly correlated and, in particular, psycho-acoustically sensitive low frequency bands.

The present invention provides a frequency-dependent correction of the mono signal where the correction factor depends on a frequency-dependent cross-correlation and relative levels of the input signals. This method reduces spectral coloration artefacts which are introduced by known summation methods and ensures energy preservation in each frequency band.

The frequency-dependent correction can be applied by first summing the input signals (by either linear or weighted summation) and then applying a correction filter, or by releasing the constraint that the weights for summation (or their squared values) necessarily sum up to +1, so that they instead sum to a value that depends on the cross-correlation.

It should be noted that the invention can be applied to any system where two or more input channels are combined.

Embodiments of the invention will now be described with reference to the accompanying drawings, in which:

FIG. 1 shows a prior art encoder;

FIG. 2 shows a block diagram of an audio system including the encoder of FIG. 1;

FIG. 3 shows the steps performed by a signal summation component of an audio coder according to a first embodiment of the invention; and

FIG. 4 shows linear interpolation of the correction factors m(i) applied by the summation component of FIG. 3.

According to the present invention, there is provided an improved signal summation component (S8′), in particular for performing the step corresponding to S8 of FIG. 1. Nonetheless, it will be seen that the invention is applicable anywhere two or more signals need to be summed. In a first embodiment of the invention, the summation component adds left and right stereo channel signals prior to the summed signal S being encoded, step S9.

Referring now to FIG. 3, in the first embodiment, the left (L) and right (R) channel signals provided to the summation component comprise multi-channel segments m1, m2 . . . overlapping in successive time frames t(n−1), t(n), t(n+1). Typically, sinusoids are updated at an interval of 10 ms and each segment m1, m2 . . . is twice the length of the update interval, i.e. 20 ms.

For each overlapping time window t(n−1), t(n), t(n+1) for which the L, R channel signals are to be summed, the summation component uses a (square-root) Hanning window function to combine each channel signal from overlapping segments m1, m2 . . . into a respective time-domain signal representing each channel for a time window, step 42.

An FFT (Fast Fourier Transform) is applied on each time-domain windowed signal, resulting in a respective complex frequency spectrum representation of the windowed signal for each channel, step 44. For a sampling rate of 44.1 kHz and a frame length of 20 ms, the length of the FFT is typically 882. This process results in a set of K frequency components for both input channels (L(k), R(k)).
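The frame length and window quoted above can be reproduced with a few lines. This is only a sketch under stated assumptions: the text specifies a (square-root) Hanning window, 20 ms segments and a 10 ms update interval, while the periodic form of the window and the helper names `frame_length` and `sqrt_hann` are choices made here for illustration.

```python
import math

def frame_length(sample_rate_hz, frame_ms):
    """Number of samples in one frame."""
    return round(sample_rate_hz * frame_ms / 1000)

def sqrt_hann(n):
    """Periodic square-root Hann window of length n (assumed variant)."""
    return [math.sqrt(0.5 - 0.5 * math.cos(2 * math.pi * t / n)) for t in range(n)]

N = frame_length(44100, 20)  # samples per 20 ms segment at 44.1 kHz
w = sqrt_hann(N)
hop = N // 2                 # 10 ms update interval, i.e. 50 % overlap

# Squared window values of two frames overlapping by half a frame.
overlap_sum = [w[t] ** 2 + w[t + hop] ** 2 for t in range(hop)]

print(N, min(overlap_sum), max(overlap_sum))
```

With this window, the squared values of two half-overlapping frames sum to one at every sample, which is why a square-root window is a common choice when windowing is followed later by overlap-add.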

In the first embodiment, the two input channel representations L(k) and R(k) are first combined by a simple linear summation, step 46. It will be seen, however, that this could easily be extended to weighted summation. Thus, for the present embodiment, sum signal S(k) comprises:

S(k)=L(k)+R(k)

Separately, the frequency components of the input signals L(k) and R(k) are grouped into several frequency bands, preferably using perceptually-related bandwidths (ERB or BARK scale) and, for each subband i, an energy-preserving correction factor m(i) is computed, step 45:

$m^{2}(i) = \frac{\sum\limits_{k \in i}\left\{ |L(k)|^{2} + |R(k)|^{2} \right\}}{2\sum\limits_{k \in i}|S(k)|^{2}} = \frac{\sum\limits_{k \in i}\left\{ |L(k)|^{2} + |R(k)|^{2} \right\}}{2\sum\limits_{k \in i}|L(k) + R(k)|^{2}}$   Equation 1

which can also be written as:

$m^{2}(i) = \frac{1}{2}\,\frac{\sum\limits_{k \in i}\left\{ |L(k)|^{2} + |R(k)|^{2} \right\}}{\sum\limits_{k \in i}|L(k)|^{2} + \sum\limits_{k \in i}|R(k)|^{2} + 2\rho_{LR}(i)\sqrt{\sum\limits_{k \in i}|L(k)|^{2}\sum\limits_{k \in i}|R(k)|^{2}}}$   Equation 2

with ρ_(LR)(i) being the (normalized) cross-correlation of the waveforms of subband i, a parameter used elsewhere in parametric multi-channel coders and so readily available for the calculations of Equation 2. In any case, step 45 provides a correction factor m(i) for each subband i.
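The two forms of the correction factor can be compared numerically. The sketch below is illustrative only: the function names and the sample spectra are invented here, and the normalized cross-correlation is assumed to be the real part of the band's cross-spectrum divided by the geometric mean of the band energies, which is the definition that makes Equation 2 agree with Equation 1. Bands in which the summed energy is zero are not handled.

```python
import math

def correction_factor_sq(L, R, band):
    """m^2(i) from Equation 1: input energies over twice the energy of the sum."""
    num = sum(abs(L[k]) ** 2 + abs(R[k]) ** 2 for k in band)
    den = 2 * sum(abs(L[k] + R[k]) ** 2 for k in band)
    return num / den  # undefined when the band of the sum signal is empty of energy

def correction_factor_sq_rho(L, R, band):
    """m^2(i) from Equation 2, using an assumed definition of rho_LR(i)."""
    e_l = sum(abs(L[k]) ** 2 for k in band)
    e_r = sum(abs(R[k]) ** 2 for k in band)
    cross = sum((L[k] * R[k].conjugate()).real for k in band)
    rho = cross / math.sqrt(e_l * e_r)
    return 0.5 * (e_l + e_r) / (e_l + e_r + 2 * rho * math.sqrt(e_l * e_r))

# Hypothetical spectra for one four-bin band.
L = [1 + 0j, 0.5j, -0.3 + 0.2j, 0.8 + 0j]
R = [0.9 + 0.1j, -0.2j, 0.4 + 0j, 0.7 - 0.1j]
band = range(4)

m2_direct = correction_factor_sq(L, R, band)
m2_rho = correction_factor_sq_rho(L, R, band)
m2_identical = correction_factor_sq(L, L, band)

print(m2_direct, m2_rho, m2_identical)
```

Both functions give the same value for the sample band, and two identical channels give m²(i) = ¼.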

The next step 47 then comprises multiplying each frequency component S(k) of the sum signal with a correction filter C(k):

S′(k)=S(k)C(k)=C(k)L(k)+C(k)R(k)   Equation 3

It will be seen from the last component of Equation 3 that the correction filter can be applied to either the summed signal (S(k)) alone or each input channel (L(k), R(k)). As such, steps 46 and 47 can be combined when the correction factor m(i) is known or performed separately with the summed signal S(k) being used in the determination of m(i), as indicated by the hashed line in FIG. 3.

In the preferred embodiments, the correction factors m(i) are used for the center frequencies of each subband, while for other frequencies, the correction factors m(i) are interpolated to provide the correction filter C(k) for each frequency component (k) of a subband i. In principle, any interpolation function can be used; however, empirical results have shown that a simple linear interpolation scheme suffices (FIG. 4).
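A linear interpolation of this kind can be sketched as follows. The function name `correction_filter`, the representation of band centres as bin indices, and the choice to hold the first and last factors constant outside the outermost centres are all assumptions made for this illustration; the text only states that linear interpolation between the band-centre values suffices.

```python
def correction_filter(m, centres, num_bins):
    """Interpolate band factors m[i], anchored at bin indices centres[i], to every bin.

    centres must be strictly increasing. Bins below the first centre or above
    the last centre reuse the nearest factor (an assumed edge rule).
    """
    C = []
    for k in range(num_bins):
        if k <= centres[0]:
            C.append(m[0])
        elif k >= centres[-1]:
            C.append(m[-1])
        else:
            # Index of the last centre at or below bin k.
            j = max(i for i in range(len(centres)) if centres[i] <= k)
            t = (k - centres[j]) / (centres[j + 1] - centres[j])
            C.append((1 - t) * m[j] + t * m[j + 1])
    return C

# Hypothetical factors for three bands centred on bins 1, 5 and 9.
C = correction_filter([0.5, 1.0, 0.8], [1, 5, 9], 11)
print(C)
```

At each band centre C(k) equals m(i) exactly, and between centres it changes in equal steps, which avoids abrupt jumps at band edges.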

Alternatively, an individual correction factor could be derived for each FFT bin (i.e., subband i corresponds to frequency component k), in which case no interpolation is necessary. This method, however, may result in a jagged rather than a smooth frequency behaviour of the correction factors which is often undesired due to resulting time-domain distortions.

In the preferred embodiments, the summation component then takes an inverse FFT of the corrected summed signal S′(k) to obtain a time domain signal, step 48. By applying overlap-add for successive corrected summed time domain signals, step 50, the final summed signal s1, s2 . . . is created and this is fed through to be encoded, step S9, FIG. 1. It will be seen that the summed segments s1, s2 . . . correspond to the segments m1, m2 . . . in the time domain and as such no loss of synchronisation occurs as a result of the summation.

It will be seen that where the input channel signals are not overlapping signals but rather continuous time signals, then the windowing step 42 will not be required. Similarly, if the encoding step S9 expects a continuous time signal rather than an overlapping signal, the overlap-add step 50 will not be required. Furthermore, it will be seen that the described method of segmentation and frequency-domain transformation can also be replaced by other (possibly continuous-time) filterbank-like structures. Here, the input audio signals are fed to a respective set of filters, which collectively provide an instantaneous frequency spectrum representation for each input audio signal. This means that sequential segments can in fact correspond with single time samples rather than blocks of samples as in the described embodiments.

It will be seen from Equation 1 that there are circumstances where particular frequency components for the left and right channels may cancel out one another or, if they have a negative correlation, they may tend to produce very large correction factor values m²(i) for a particular band. In such cases, a sign bit could be transmitted to indicate that the sum signal for the component S(k) is:

S(k)=L(k)−R(k)

with a corresponding subtraction used in Equations 1 or 2.

Alternatively, the components for a frequency band i might be rotated more into phase with one another by an angle α(i). The ITD analysis process S3 provides the (average) phase difference between (subbands of the) input signals L(k) and R(k). Assuming that for a certain frequency band i the phase difference between the input signals is given by α(i), the input signals L(k) and R(k) can be transformed to two new input signals L′(k) and R′(k) prior to summation according to the following:

L′(k)=e^(jcα(i)) L(k)

R′(k)=e^(−j(1−c)α(i)) R(k)

with c being a parameter which determines the distribution of phase alignment between the two input channels (0≤c≤1).
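The effect of this rotation can be checked with a small example. The sketch below assumes that α(i) is the phase by which the right channel leads the left channel in the band (the text does not fix the sign convention), and the function name `align` and the sample values are invented for illustration.

```python
import cmath

def align(L, R, alpha, c=0.5):
    """Rotate L by +c*alpha and R by -(1 - c)*alpha, with 0 <= c <= 1."""
    rot_l = cmath.exp(1j * c * alpha)
    rot_r = cmath.exp(-1j * (1 - c) * alpha)
    return [rot_l * v for v in L], [rot_r * v for v in R]

# Hypothetical band: R is L advanced in phase by alpha (assumed convention).
alpha = 2.0
L = [1 + 0j, 2j]
R = [v * cmath.exp(1j * alpha) for v in L]

L2, R2 = align(L, R, alpha, c=0.3)

energy_in = sum(abs(v) ** 2 for v in L)
energy_sum = sum(abs(a + b) ** 2 for a, b in zip(L2, R2))
print(energy_sum / energy_in)
```

After the rotation the two channels coincide, so their sum carries four times the energy of one input, as for fully in-phase signals, whichever value of c between 0 and 1 is chosen.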

In any case, it will be seen that where for example two channels have a correlation of +1 for a sub-band i, then m²(i) will be ¼ and so m(i) will be ½. Thus, the correction factor C(k) for any component in the band i will tend to preserve the original energy level by tending to take half of each original input signal for the summed signal. However, as can be seen from Equation 1, where a frequency band i of a stereo signal includes spatial properties, the energy of the signal S(k) will tend to get smaller than if they were in phase, while the sum of the energies of the L, R signals will tend to stay large and so the correction factor will tend to be larger for those signals. As such, overall energy levels in the sum signal will still be preserved across the spectrum, in spite of frequency-dependent correlation in the input signals.
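The energy-preservation property follows directly from Equation 1. The short derivation below is a sketch that assumes the correction filter is constant across the band, C(k) = m(i) for every k in band i, i.e. it ignores the interpolation between band centres:

$\sum\limits_{k \in i}|S'(k)|^{2} = m^{2}(i)\sum\limits_{k \in i}|S(k)|^{2} = \frac{1}{2}\sum\limits_{k \in i}\left\{ |L(k)|^{2} + |R(k)|^{2} \right\}$

Under that assumption, the energy of the corrected sum signal in band i equals the mean of the two input energies in that band, whatever the cross-correlation of the inputs in the band.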

In a second embodiment, the extension towards multiple (more than two) input channels is shown, combined with possible weighting of the input channels mentioned above. The frequency-domain input channels are denoted by X_(n)(k), for the k-th frequency component of the n-th input channel. The frequency components k of these input channels are grouped in frequency bands i. Subsequently, a correction factor m(i) is computed for subband i as follows:

$m^{2}(i) = \frac{\sum\limits_{n}\sum\limits_{k \in i}|w_{n}(k)X_{n}(k)|^{2}}{n\sum\limits_{k \in i}\left|\sum\limits_{n}w_{n}(k)X_{n}(k)\right|^{2}}$

In this equation, w_(n)(k) denote frequency-dependent weighting factors of the input channels n (which can simply be set to +1 for linear summation). From these correction factors m(i), a correction filter C(k) is generated by interpolation of the correction factors m(i) as described in the first embodiment. Then the mono output channel S(k) is obtained according to:

$S(k) = C(k)\sum\limits_{n}w_{n}(k)X_{n}(k)$
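The multi-channel case can be sketched in the same way as the two-channel case. In this illustration the factor n in the denominator is read as the number of input channels (which reduces to the factor 2 of Equation 1 for two channels), and the correction filter is simplified to C(k) = m(i) for every bin of band i instead of being interpolated; both are assumptions, as are the function names and the sample data.

```python
import math

def band_factor_sq(X, w, band):
    """m^2(i) for channels X[n][k] with weights w[n][k] over the bins in band."""
    n_ch = len(X)  # assumed meaning of the factor n in the denominator
    num = sum(abs(w[n][k] * X[n][k]) ** 2 for n in range(n_ch) for k in band)
    den = n_ch * sum(
        abs(sum(w[n][k] * X[n][k] for n in range(n_ch))) ** 2 for k in band
    )
    return num / den  # undefined when the weighted sum has no energy in the band

def downmix(X, w, bands):
    """Mono spectrum S(k) = C(k) * sum_n w_n(k) X_n(k), with C(k) = m(i) per band."""
    n_ch = len(X)
    S = [0j] * len(X[0])
    for band in bands:
        m = math.sqrt(band_factor_sq(X, w, band))
        for k in band:
            S[k] = m * sum(w[n][k] * X[n][k] for n in range(n_ch))
    return S

# Hypothetical input: three identical two-bin channels with unit weights.
X = [[1 + 0j, 2j], [1 + 0j, 2j], [1 + 0j, 2j]]
w = [[1.0, 1.0] for _ in X]
bands = [range(0, 1), range(1, 2)]

m2 = band_factor_sq(X, w, bands[0])
S = downmix(X, w, bands)
print(m2, S)
```

For three identical channels the factor is m²(i) = 1/9, so the downmix reproduces one input channel rather than three times its amplitude.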

It will be seen that, using the above equations, the weights of the different channels do not necessarily sum to +1; however, the correction filter automatically corrects for weights that do not sum to +1 and ensures (interpolated) energy preservation in each frequency band.

1. A method of generating a monaural signal (S) comprising a combination of at least two input audio channels (L, R), comprising the steps of: for each of a plurality of sequential segments (t(n)) of said audio channels (L, R), summing (46) corresponding frequency components from respective frequency spectrum representations for each audio channel (L(k), R(k)) to provide a set of summed frequency components (S(k)) for each sequential segment; for each of said plurality of sequential segments, calculating (45) a correction factor (m(i)) for each of a plurality of frequency bands (i) as a function of the energy of the frequency components of the summed signal in said band $\left( \sum\limits_{k \in i}|S(k)|^{2} \right)$ and the energy of said frequency components of the input audio channels in said band $\left( \sum\limits_{k \in i}\left\{ |L(k)|^{2} + |R(k)|^{2} \right\} \right)$; and correcting (47) each summed frequency component as a function of the correction factor (m(i)) for the frequency band of said component.
2. A method according to claim 1 further comprising the steps of: providing (42) a respective set of sampled signal values for each of a plurality of sequential segments for each input audio channel; and for each of said plurality of sequential segments, transforming (44) each of said set of sampled signal values into the frequency domain to provide said complex frequency spectrum representations of each input audio channel (L(k), R(k)).
3. A method according to claim 2 wherein the step of providing said sets of sampled signal values comprises: for each input audio channel, combining overlapping segments (m1, m2) into respective time-domain signals representing each channel for a time window (t(n)).
4. A method according to claim 1 further comprising the step of: for each sequential segment, converting (48) said corrected frequency spectrum representation of said summed signal (S′(k)) into the time domain.
5. A method according to claim 4 further comprising the step of: applying overlap-add (50) to successive converted summed signal representations to provide a final summed signal (s1, s2).
6. A method according to claim 1 wherein two input audio channels are summed and wherein said correction factors (m(i)) are determined according to the function:

$m^{2}(i) = \frac{\sum\limits_{k \in i}\left\{ |L(k)|^{2} + |R(k)|^{2} \right\}}{2\sum\limits_{k \in i}|S(k)|^{2}} = \frac{\sum\limits_{k \in i}\left\{ |L(k)|^{2} + |R(k)|^{2} \right\}}{2\sum\limits_{k \in i}|L(k) + R(k)|^{2}}$

7. A method according to claim 1 wherein two or more input audio channels (X_(n)) are summed according to the function:

$S(k) = C(k)\sum\limits_{n}w_{n}(k)X_{n}(k)$

wherein C(k) is the correction factor for each frequency component and wherein said correction factors (m(i)) for each frequency band are determined according to the function:

$m^{2}(i) = \frac{\sum\limits_{n}\sum\limits_{k \in i}|w_{n}(k)X_{n}(k)|^{2}}{n\sum\limits_{k \in i}\left|\sum\limits_{n}w_{n}(k)X_{n}(k)\right|^{2}}$

wherein w_(n)(k) comprises a frequency-dependent weighting factor for each input channel.
8. A method according to claim 7 wherein w_(n)(k)=1 for all input audio channels.
9. A method according to claim 7 wherein w_(n)(k)≠1 for at least some input audio channels.
10. A method according to claim 7 wherein the correction factor for each frequency component (C(k)) is derived from a linear interpolation of the correction factors (m(i)) for at least one band.
11. A method according to claim 1 further comprising the steps of: for each of said plurality of frequency bands, determining an indicator (α(i)) of the phase difference between frequency components of said audio channels in a sequential segment; and prior to summing corresponding frequency components, transforming the frequency components of at least one of said audio channels as a function of said indicator for the frequency band of said frequency components.
12. A method according to claim 11 wherein said transforming step comprises operating the following functions on frequency components (L(k), R(k)) of left and right input audio channels (L, R):

L′(k)=e^(jcα(i)) L(k)

R′(k)=e^(−j(1−c)α(i)) R(k)

wherein 0≤c≤1 determines the distribution of phase alignment between the said input channels.
13. A method according to claim 1 wherein said correction factor is a function of a sum of energy of the frequency components of the summed signal in said band and a sum of the energy of said frequency components of the input audio channels in said band.
14. A component (S8′) for generating a monaural signal from a combination of at least two input audio channels (L, R), comprising: a summer (46) arranged to sum, for each of a plurality of sequential segments (t(n)) of said audio channels (L, R), corresponding frequency components from respective frequency spectrum representations for each audio channel (L(k), R(k)) to provide a set of summed frequency components (S(k)) for each sequential segment; means for calculating (45) a correction factor (m(i)) for each of a plurality of frequency bands (i) of each of said plurality of sequential segments as a function of the energy of the frequency components of the summed signal in said band $\left( \sum\limits_{k \in i}|S(k)|^{2} \right)$ and the energy of said frequency components of the input audio channels in said band $\left( \sum\limits_{k \in i}\left\{ |L(k)|^{2} + |R(k)|^{2} \right\} \right)$; and a correction filter (47) for correcting each summed frequency component as a function of the correction factor (m(i)) for the frequency band of said component.
15. An audio coder including the component of claim 14.
16. Audio system comprising an audio coder as claimed in claim 15 and a compatible audio player.